Integrative identification of hub genes in development of atrial fibrillation related stroke

Background As the most common arrhythmia, atrial fibrillation (AF) is associated with a significantly increased risk of stroke, which causes high disability and mortality. To date, the underlying mechanism of stroke occurring after AF remains unclear. Herein, we studied hub genes and regulatory pathways involved in AF and secondary stroke and aimed to reveal biomarkers and therapeutic targets of AF-related stroke. Methods The GSE79768 and GSE58294 datasets were used to analyze AF- and stroke-related differentially expressed genes (DEGs) to obtain a DEG1 dataset. Weighted correlation network analysis (WGCNA) was used to identify modules associated with AF-related stroke in GSE66724 (DEG2). DEG1 and DEG2 were merged, and hub genes were identified based on protein–protein interaction networks. Gene Ontology terms were used to analyze the enriched pathways. The GSE129409 and GSE70887 were applied to construct a circRNA-miRNA-mRNA network in AF-related stroke. Hub genes were verified in patients using quantitative real-time polymerase chain reaction (qRT-PCR). Results We identified 3,132 DEGs in blood samples and 253 DEGs in left atrial specimens. Co-expressed hub genes of EIF4E3, ZNF595, ZNF700, MATR3, ACKR4, ANXA3, SEPSECS-AS1, and RNF166 were significantly associated with AF-related stroke. The hsa_circ_0018657/hsa-miR-198/EIF4E3 pathway was explored as the regulating axis in AF-related stroke. The qRT-PCR results were consistent with the bioinformatic analysis. Conclusions Hub genes EIF4E3, ZNF595, ZNF700, MATR3, ACKR4, ANXA3, SEPSECS-AS1, and RNF166 have potential as novel biomarkers and therapeutic targets in AF-related stroke. The hsa_circ_0018657/hsa-miR-198/EIF4E3 axis could play an important role regulating the development of AF-related stroke.

a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 expression profile GSE58294 was performed on Platform GPL570 containing plasma samples from patients with cardioembolic stroke (n = 69) and healthy controls (n = 23). Subjects were analyzed at three time points: less than 3 h, 5 h, and 24 h after stroke. The mRNA expression profile data GSE66724 was based on peripheral blood cells in eight patients with AF and stroke and eight AF subjects without stroke (same anticoagulation and arrhythmia therapy). The miRNA dataset GSE70887 was performed on Platform GPL19546, containing LA biopsies of two sinus rhythm and four persistent AF patients. The circRNA dataset GSE129409 was performed on GPL21825 with heart tissues from three AF patients and three healthy controls. The flowchart of this study is shown in Fig 1.

Data screening strategy
The limma package in R [12] was applied to evaluate all the five matrix datasets. The Bayesian method was used to correct for batch effects. If more than one probe mapped to the same gene, the average expression value was used to equal its expression value. The Benjamini-Hochberg method was used to adjust original p-values, and the false discovery rate procedure was used to calculate fold changes (FC). The DEGs were screened by P < 0.05 and log FC > average (abs(log FC)) + 2 � SD (abs(log FC)) [13]. The expression values of the |log FC| > 0.618 and adjusted P < 0.05 were used for filtering AF-DEGs. The |log FC| > 0.62 and P < 0.05 were used to identify stroke-DEGs. The |log FC| > 0.273 and P < 0.05 were used to identify AF-related stroke-DEGs. As per DEMis and DECircs, the cutoff values for log FC were 1.14 and 2.09, respectively. Volcano maps and heatmaps were also drawn by the plot and pheatmap package in the R studio to exhibit all DEGs.

Construction of a weighted gene co-expression network and identification of significant modules
The co-expression network of DEGs was constructed based on the GSE66724 microarray dataset using the R package "weighted correlation network analysis (WGCNA) [14]." The softthresholding power was set to 5 when 0.85 was used as the correlation coefficient threshold, and 50 was selected as the minimum number of genes in the modules. To merge possibly similar modules, we defined 0.25 as the threshold for cut height. Identification of hub genes and efficacy evaluation DEG1 was confirmed by intersecting AF-DEGs and stroke-DEGs. Then, genes in the module of interest by WGCNA were intersected with AF-related stroke-DEGs, and DEG2 was obtained. Thereafter, DEG1 and DEG2 were merged to obtain AF-related stroke genes (DEG3). The venn plot of these three DEG datasets were showed in S1 Fig. The PPI network of DEG3 were showed in S2 Fig. The ROC curve was then plotted and the AUC was calculated to evaluate the capability of selected genes to distinguish AF-related stroke patients and AF patients without stroke. Only genes with an AUC > 0.8 were considered hub genes. The comparative toxicogenomic database (http://ctdbase.org/) was used to analyze possible relationships between hub genes and nervous or cardiovascular diseases (Fig 1).

Functional enrichment analysis
Functional annotation of Gene Ontology (GO) [15,16] was performed with the ClusterProfiler package [17] in R studio. GO terms (biological processes, molecular functions, and cellular components) with a P < 0.05 were considered significantly enriched. Subsequently, the AmiGO database (v2.0; http://amigogeneontology.org/amigo/) was used to analyze the GO consortium for filtered hub genes, verify their accuracy, and annotate their biological functions.

Protein-protein interaction (PPI) networks
To calculate the interactions between molecules in AF-DEGs, stroke-DEGs, and WGCNA modules, the online database STRING [18] (https://string-db.org/) was applied with a confidence score > 0.4. Cytoscape software (Version 3.7.2; http://cytoscape.org/) was used to visualize and analyze the biological networks and node degrees [19].

Construction of circRNA-miRNA-mRNA network for AF-related stroke
Three online databases, including miRDB (http://www.mirdb.org), miRTarBase (http:// mirtarbase.mbc.nctu.edu.tw), and TargetScan (http://www.targetscan.org), were used to predict the targets of DEMis. The targeted genes fulfilling the qualification of at least two databases were selected. Predicted genes were further filtered by matching the hub genes selected above. Then, the miRNA-mRNA pairs were determined. The miRNA targets of DECircs were predicted using the online database circBank (http://www.circbank.cn/), and they were validated by the DEMis selected above. Then, the circRNA-miRNA pairs were determined. Cytoscape was used to visualize this circRNA-miRNA-mRNA network.

Blood sample collection and quantitative real-time polymerase chain reaction
AF patients with or without stroke in the Huashan Hospital were included in this study. The exclusion criteria were coronary artery heart disease, hypertension, diabetes, and obstructive sleep apnea syndrome. Blood samples from three AF patients with stroke and three AF patients without stroke were collected. Samples were immediately preserved in RNALock Reagent (TIANGEN, Beijing, China) to inhibit RNA degradation. This study was approved by the Medical Ethics Committee of Huashan Hospital, Fudan University. All patients participating in this study provided written informed consent before blood collection.
The total RNA was extracted from blood samples with an RNAprep pure Blood Kit (TIANGEN, Beijing, China). Total RNA was quantified with a NanoDrop spectrophotometer 2000 (Thermo Fisher Scientific, Waltham, MA, USA) and then reversely transcribed into cDNA using random primers with the PrimeScript™ RT Reagent Kit (TaKara, Dalian, China). A quantitative real-time polymerase chain reaction (qRT-PCR) was carried out with TB Green™ Premix Ex Taq™ (TaKara, Dalian, China) on a CFX-96 real-time PCR System (Bio-Rad, Shanghai, China). Expression data was normalized by GAPDH, and the 2 −ΔΔCT method was applied to analyze the results. All sequences for RNA primers were purchased from Sangon Biotech, Shanghai, China.

Heatmaps of the top 30 AF-DEGs and stroke-DEGs are shown in
The volcanic diagram of all genes and the expression heatmap of the top 30 DEGs in GSE66724 (S3A Fig, Fig 3A), GSE70887 (S3B Fig, Fig 3B), and GSE129409 (S3C Fig, Fig 3C).

Construction of co-expression networks and identification of key modules
A hierarchical clustering tree was created based on the dynamic hybrid cut with a scale-free network and topological overlaps (Fig 4A). Based on the scale-free topology criterion, a softthresholding power of 5 was selected (scale-free R 2 = 0.85; Fig 4B and 4C). Five modules were identified for further analysis. The cluster dendrogram of the modules is shown in Fig 4D, while the clustering of module eigengenes is provided in Fig 4E. Moreover, we analyzed the association of gene modules by comparing AF with stroke and AF without stroke (Fig 5A). The turquoise module showed the highest positive correlation (r = 0.34). Therefore, we identified the turquoise module as the key one for further analysis. A total of 432 genes were included in the turquoise module (S6 Table). Moreover, we illustrated in turquoise the module membership and gene significance for AF with stroke (correlation coefficient = 0.32, P < 0.001) (Fig 5B). In addition, genes in the turquoise module were overlapped with DEGs in
We performed interactomics and GO functional enrichment analysis on DEG3 based on GeneMania (http://genemania.org/). As shown in Fig 7A, hydrolase activity, secretory granule lumen, azurophil granule and primary lysosome were the main enriched terms.

Validation of hub genes of AF-related stroke
Based on the results of DEG3 and ROC curves in GSE66724 and GSE58294 (Fig 7D and 7E), we filtered eight hub genes of AF-related stroke (AUC > 0.8), including EIF4E3, ZNF595, ZNF700, MATR3, ACKR4, ANXA3, SEPSECS-AS1, and RNF166. The database showed that hub genes targeted several nervous system and cardiovascular diseases (Fig 8A-8H). GO term enrichment related to biological processes, molecular functions, and cellular components of hub genes were associated with various processes, as indicated in Table 1.

Reconstruction of the circRNA-miRNA-mRNA network in AF-related stroke
As shown in Fig 8I, a circRNA-miRNA-mRNA regulatory network bearing ten circRNAs, eight miRNAs, and one mRNA was constructed to demonstrate pathophysiologic mechanisms in AF-related stroke. The parameters of degree, closeness, and betweenness in the network were calculated by the plugin cytoHubba in Cytoscape. The top 5 nodes were hsa-miR-33a-3p, hsa_circ_0000444, hsa-miR-452-3p, hsa-miR-198, and hsa_circ_0018657. EIF4E3 is the target of hsa-miR-198, and hsa-miR-198 is the target of hsa_circ_0018657, suggesting that hsa_-circ_0018657/hsa-miR-198/EIF4E3 could be an important pathway regulating the development of AF-related stroke.

Experimental validation of hub genes
The expression levels of eight hub genes, miR-198, and hsa_circ_0018657 were detected in blood samples by qRT-PCR (Fig 9A-9J). Results showed that the expression of EIF4E3, ANXA3, and hsa_circ_0018657 was significantly higher in AF-related stroke patients than those in AF patients without stroke, which was consistent with the bioinformatic analysis. The expression of MATR4, ACKR4, RNF166, and miR-198 in AF-related stroke patients was lower than those in AF patients without stroke.

Discussion
As a serious public health concern, AF has shown an increasing incidence and prevalence in the elderly population, which is associated with elevated risks of cerebrovascular disease events and deaths [20][21][22]. Discriminating individuals with AF who are prone to develop atrial mural thrombus and cardiogenic stroke is of great concern. Despite continuous cardiac rhythm monitoring, around 30% of patients show no obvious signs before the occurrence of cerebrovascular events. Thus, identifying biomarkers of AF-related stroke and elucidating the relationship between AF and embolic events may provide novel therapeutic targets for primary care [23].

PLOS ONE
In this study, a series of bioinformatics analyses were carried out to filter hub genes of AFrelated stroke. We first took the overlap of DEGs from AF and stroke patients to obtain DEG1. Then, WGCNA was used to identify key modules associated with AF-related stroke to obtain DEG2. The overlap of DEG1 and DEG2 (DEG3) was considered to include important genes in the development of AF-related stroke. Then, the PPI network of AF-and stroke-related DEGs was constructed, and we filtered eight hub genes based on ROC analyses. Those hub genes could facilitate the prevention of AF-related stroke and provide novel therapeutic targets. We also investigated the biological processes, cellular components, and molecular functions of the DEGs, revealing that these genes are significantly associated with the regulation of the BMP signaling pathway, smooth muscle cell proliferation, and extracellular matrix disassembly. Finally, we constructed a circRNA-miRNA-mRNA network of AF-related stroke and screened out the hsa_circ_0018657/hsa-miR-198/EIF4E3 axis as an important regulatory pathway in the development of AF-related stroke. These results were further validated in our qRT-PCR experiments.
In the present study, EIF4E3 was explored as a potential molecular signature for AF patients with a high probability of stroke occurrence, which may play an important role in the development of AF-related stroke. IF4E3 belongs to the EIF4E family as a translational initiation factor that interacts with the 5-prime cap structure of mRNA and recruit mRNA to the ribosome. Interestingly, although considered to regulate the nervous system directly or indirectly, EIF4E3 also has a role in the cardiovascular system. Mrvová, S and colleagues performed bioinformatics analyses of expressed sequence tags and the 3'-UTRs of the main transcript splice variants of the translational initiation factor EIF4E3 and showed that EIF4E3 mRNAs have a great potential in heavy post-transcriptional regulation [24]. EIF4E3 truncated transcript variants were mainly found in the brain. However, EIF4E3 also promotes angiogenesis in the region surrounding myocardial infarction [25].
Along with genetic susceptibility, thrombus vulnerability is another main reason for AFrelated stroke. miRNAs have also been proposed as potential biomarkers for vulnerable plaques. Hoekstra, M et al. explored peripheral blood mononuclear cell microRNA profiles from coronary artery disease patients and found that miR-198 exhibited a high expression level in unstable angina pectoris patients compared with such levels in stable patients [26]. Sepramaniam, S and colleagues performed a miRNA microarray of peripheral blood samples from acute stroke patients and healthy controls, presenting that miR-198 was dysregulated in stroke patients [27], which is consistent with our results. Based on previous studies, we believe that miR-198 could play an important role in the development of AF-related stroke by binding to the 3'UTR of EIF4E3.
As the host gene of hsa_circ_0018657, EIF4EBP2 was also found to participate in brain dysfunction. Martín-Flores, N showed the significant association of SNP rs1043098 in the EIF4EBP2 gene with the onset of dyskinesia induced by L-DOPA administration [28]. In multiple sclerosis patients, EIF4EBP2 expression was downregulated compared to those in healthy controls [29]. Thus, hsa_circ_0018657 could also play a part in cerebral disease, which needs further verification. As a miRNA sponge, hsa_circ_0018657 could regulate miRNA-198 by targeting EIF4E3. The role of this axis in the development of AF-related stroke deserves further exploration.
ZNF595 and ZNF700 belong to the zinc finger protein family, whose members function as transcription factors that can regulate a broad variety of developmental and cellular processes. In schizophrenia (SCZ) patients, nonsense de novo mutations of ZNF595 are common. Data from genome-wide association studies suggested that common variants in the ZNF595 gene may be associated with SCZ and SCZ-related traits [30]. Pathogenic mutations in selenocysteine synthase (SEPSECS) cause neurodevelopmental disorders [31][32][33]. While using sequencing data analysis, Laan, L identified regional epigenetic changes in the transcription factor gene ZNF700, which is relevant in Down syndrome brain development, providing a novel framework for further studies on epigenetic changes and transcriptional dysregulation during chromosome 21 neurogenesis [34].
The MATR3 gene encodes a nuclear matrix protein, which is proposed to stabilize certain mRNA species. Previous studies proved that mutations in MATR3 cause hereditary amyotrophic lateral sclerosis [35][36][37][38][39][40]. In a genome-wide association study using memory performance in a cohort of elderly individuals (>60years), MATR3 was significantly associated with neuronal development, synaptic plasticity, and memory-related processes [41]. In addition, the 3' UTR of MATR3 encodes the nuclear matrix protein MATR3, which is strongly expressed in the neural crest, developing heart, and great vessels [42]. Thus, subtle perturbations in MATR3 expression appear to cause similar left ventricular outflow tract defects in humans and mice. ACRK4 is a member of the G protein-coupled receptor family and is a receptor for C-C type chemokines. ACKR4 binds the homeostatic chemokines CCL19, CCL21, CCL25, and CXCL13 and has been attributed scavenging properties [43]. The expression of ACKR4 was upregulated in the border/infarct area after myocardial infarction, and knocking out ACKR4 protected against adverse ventricular remodeling in mice post-infarction, indicating that ACKR4 may be a novel therapeutic target to ameliorate cardiac remodeling [44]. ANXA3 is recognized as a regulator of cerebral ischemia/reperfusion injury. The upregulation of ANXA3 could promote cell viability, decrease cell apoptosis, and reduce the production of inflammatory cytokines in neurons after oxygen-glucose deprivation [45]. The silencing of ANXA3 would promote repair and healing of the myocardium after infarction by the activation of the PI3K/Akt signaling pathway [46]. Thus, there may be a relationship between cardiovascular and nervous system diseases, arising from loci mutations or gene variants [23]. This study also has several limitations. First, this study was based on microarray analysis, and gene expression may be not directly equivalent to protein expression. Second, data for the analysis of AF-related stroke mainly refers to persistent AF patients. Although persistent AF was most hazardous in stroke cases, other AF forms should be studied in the future. Finally, although we performed qRT-PCR to verify the expression levels of genes, more in vitro and in vivo experiments should be carried out to validate our results.